This notebook aims to compile and annotate the R scripts used for the analysis of the Eudicot-Botrytis dataset.
This notebook charges R scripts that contain all the code. The main result figures are plotted directly in the notebook.
Eight Eudicot species were included in this dataset. This includes 7 crops and Arabidopsis thaliana as control. For the crops, six low improvement genotypes (wild, landraces) and six high improvement genotypes (cultivar, inbred lines) were tested. For each species, each genotype was inoculated with the collection of Botrytis isolates and replicated six time over two experiments.
Detached leaves were inoculated with Botrytis in ‘experimental trays’, that constitutes a micro-environment for a randomized collection of isolates. After 72h, pictures of all trays were taken. Image analysis for calculation of lesion area (and many other parameters) was conducted in R.
For image analysis R codes, see the Image_analysis_pipeline_Final R notebook.
Read the raw data, do small modifications on names and create variables.
Associated files: FULLMetaDat_7sp.txt
accession.list7sp.txt
Isolate_Host_origin_CORRECTED.txt
source('~/Desktop/Github_Eudicot/1.Creating_Dataset.R')
Calculate the mean, median, standard deviation, minimum, maximum of the lesions at different levels of the dataset. The length is the number of datapoints considered for the summary statistics. This is done at the taxon level, for each Botrytis isolate and for each plant genotype. It is also calculated for pairs of plant genotypes - isolates and taxon-isolates.
source('~/Desktop/Github_Eudicot/2.Summary_RawData.R')
ggplot(data=Summary_Isolate, aes(IsolateID, mean.iso, color=Origin)) +
geom_point()+
ylab("mean lesion size in cm2")+
ggtitle("Isolate variation colored by geographical origin")+
xlab("Botrytis Isolates")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data=Summary_Isolate, aes(IsolateID, mean.iso, color=Host)) +
geom_point()+
ylab("mean lesion size in cm2")+
ggtitle("Isolate variation colored by original host")+
xlab("Botrytis Isolates")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data=Summary_PGeno, aes(SNames, mean.geno, color=Taxon, shape=Order)) +
geom_point()+
ggtitle("Plant genotype variation")+
ylab("mean lesion size in cm2")+
xlab("Botrytis Isolates")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data=Summary_IsoGen , aes(Isolate, mean.isogen)) +
geom_boxplot()+
ggtitle("Variation across plant genotype")+
ylab("mean lesion size in cm2")+
xlab("Botrytis Isolates")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data=Summary_IsoTax , aes(Isolate, mean.isotax)) +
geom_boxplot()+
ggtitle("Variation across plant species")+
ylab("mean lesion size in cm2")+
xlab("Botrytis Isolates")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
This step remove the small size lesions that have high chance of being technical failure (for example the inoculation drops that did not contain Botrytis spores resulting in only the grape juice drop being phenotyped as lesion).
The pie charts provide further details to the table S5 about the lesion filtering and the percentage of lesion crossing the size and median thresholds.
Generated file: Eudicot7sp_clean_median.txt
source('~/Desktop/Github_Eudicot/3.Filtering_SmallLesionSize.R')
for (i in c(1:7)) {
tmp_selection <- as.data.frame(summary(splt_Data_cleaning[[i]][,46]))
rownames(tmp_selection) <- c("Above_Thresholds", "Size_Threshold_Only", "Median_Threshold_Only", "ToBeRemoved")
tmpPerC <- round(tmp_selection[,1]/sum(tmp_selection[,1])*100)
tmp_names <- paste(rownames(tmp_selection),tmpPerC, "%", sep=" ")
pie(tmp_selection[,1], labels=tmp_names, main=Names[i], col=rainbow(length(tmp_names)))
}
Lesion area was modeled independently for each species using the linear mixed model with lme4. Plant genotypes are nested within improvement level. Experimental replicate and trays as well as the individuals plants on which were collected the leaves for the detached leaf assay are considered as random factors
Associated file: Lmer_CleanM_7sp_Perc2.txt
# source('~/Desktop/Github_Eudicot/4.Modeling_clean_Data.R')
# This file contains all the mixed models for each 7 crop species.
# ! This code takes hours/days to run. !
Let’s look at the results (Figure 2B).
ModelRes <- read.table(file="Lmer_CleanM_7sp_Perc2.txt", header = T)
ModelRes$Order <- ModelRes$Variable
levels(ModelRes$Order) <- c(8,7, 5,6, 1,4, 2, 3 )
ModelRes$Order <- as.numeric(ModelRes$Order)
color_plot <- c("#762a83", "#e7d4e8", "#c2a5cf", "#762a83", "#5aae61", "#a6dba0", "#fdb863", "#c51b7d")
ggplot(ModelRes, aes(Species, Percent, fill=Variable)) +
geom_col()+
scale_fill_manual(values=c("#542788", "#d8daeb", "#e08214","#1b7837","#b8e186", "#878787","#bababa","#e0e0e0" ))+
geom_text(aes(label=Percent), position=position_stack(vjust=0.5), size=3)+
theme_minimal()
For each plant genotype, model corrected least-square means (LS-means) of lesion area were calculated for each strain from with Emmeans with the Satterwaite approximation.
# source('~/Desktop/Github_Eudicot/5.LSmeans_Genotype.R')
# (Remove the # to run it)
Describe the effects of plant domestication.
Associated file: Eudi_Ara_Lsmeans1_8sp_row_Geno_All_plot.txt
source('~/Desktop/Github_Eudicot/9.ViolinPlot.R')
ggplot(Eudi8sp, aes(Taxon1, LsMeans, fill=Wi_Do)) +
geom_split_violin(scale="width") +
geom_boxplot(width=0.15)+
scale_fill_manual(values=c("#c7eae5", "#000000", "#878787", "#80cdc1"))+
theme_minimal()
In this model,improvement levels and plant genotypes are nested within species to account for the phylogenetic common evolutionary history and possibly shared resistance traits of genotypes with low and high levels of improvement.
Associated file: LSmeanIndP_7spData_row_Geno.txt
source('~/Desktop/Github_Eudicot/7.MetaModel.R')
To provide a global picture of how plant genotypes interact specifically with each Botrytis strains, a clustered heatmap was constructed on standardized LS-means with iheatmapr.
Associated file: Eudi_Ara_Lsmeans1_8sp_Col_Geno_All.txt
Z-scoring the LS-means
source('~/Desktop/Github_Eudicot/6.ZcoreLSmeans.R')
Heatmap:
source('~/Desktop/Github_Eudicot/8.Heatmap.R')
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.6)... Done.
## Bootstrap (r = 0.69)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.39)... Done.
## Bootstrap (r = 0.49)... Done.
## Bootstrap (r = 0.59)... Done.
## Bootstrap (r = 0.69)... Done.
## Bootstrap (r = 0.79)... Done.
## Bootstrap (r = 0.89)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.4)... Done.
Eudicot_heatmap_stand
Bootstrapping test on dendrogram branches: Note: Here bootstrapping is only at 200 for a quick demo. Change to 20 000 for final results (expect ~1h of calculation).
plot(PGenost_iso_pv, main="Iso HClust 'complete'")
pvrect(PGenost_iso_pv, alpha = .95)
plot(Noadmixt_genoo_pv, main="Plant Genotypes Hierarchical Clustering method='complete'")
pvrect(Noadmixt_genoo_pv, alpha = .95)
source('~/Desktop/Github_Eudicot/10.CV_Lsmeans.R')
Spe_vir
B05_plot
Dav_plot
KT_Plot
UKR_plot